Incremental plan aggregation for generating policies in MDPs
Authors
Abstract
Despite recent advances in planning with MDPs, the problem of generating good policies remains hard. This paper describes a way to generate policies in MDPs by (1) determinizing the given MDP model into a classical planning problem; (2) building partial policies off-line by producing solution plans to the classical planning problem and incrementally aggregating them into a policy; and (3) using sequential Monte-Carlo (MC) simulations of the partial policies before execution, in order to assess the probability that a policy will require replanning during execution. The objective of this approach is to quickly generate policies whose probability of replanning is low and below a given threshold. We describe our planner RFF, which incorporates the above ideas. We present theorems showing the termination, soundness, and completeness properties of RFF. RFF was the winner of the fully-observable probabilistic track in the 2008 International Planning Competition (IPC-08). In addition to our analyses of the IPC-08 results, we analyzed RFF's performance with different plan aggregation and determinization strategies, with varying amounts of MC sampling, and with varying threshold values for the probability of replanning. The results of these experiments revealed how these factors impact the time RFF needs to generate solution policies and the quality of those policies (i.e., the average accumulated reward gathered from their execution).
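The loop below is a minimal, illustrative sketch of this determinize-plan-aggregate-simulate cycle. The MDP interface (initial_state, is_goal, actions, most_likely_outcome, sample_outcome) and the classical_plan placeholder (standing in for a classical planner such as FF) are assumptions made for the example, not RFF's actual API.

```python
def determinize(mdp):
    """Most-likely-outcome determinization: every probabilistic action is
    replaced by a deterministic action producing its most probable effect."""
    return {a: mdp.most_likely_outcome(a) for a in mdp.actions}

def classical_plan(det_model, start, goal_test):
    """Placeholder for a classical planner such as FF; expected to return a
    list of (state, action) pairs leading from start to a goal state."""
    raise NotImplementedError  # assumption: supplied by an external planner

def generate_policy(mdp, threshold=0.2, num_samples=100, horizon=50):
    det = determinize(mdp)
    # Seed the partial policy with one plan from the initial state.
    policy = dict(classical_plan(det, mdp.initial_state, mdp.is_goal))

    while True:
        # Monte-Carlo simulation of the partial policy: count runs that reach
        # a state with no policy entry, i.e. that would trigger replanning.
        failure_states = []
        for _ in range(num_samples):
            s = mdp.initial_state
            for _ in range(horizon):
                if mdp.is_goal(s):
                    break
                if s not in policy:
                    failure_states.append(s)
                    break
                s = mdp.sample_outcome(s, policy[s])

        if len(failure_states) / num_samples <= threshold:
            return policy  # estimated replanning probability is low enough

        # Aggregation step: plan from the sampled failure states and merge
        # the new plans into the partial policy, keeping existing entries.
        for s in set(failure_states):
            plan = classical_plan(det, s, mdp.is_goal)
            if plan:
                for state, action in plan:
                    policy.setdefault(state, action)
```

The threshold, sample count, and horizon values here are arbitrary illustration defaults; in practice they correspond to the replanning-probability threshold and MC sampling effort whose effects the paper's experiments examine.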
Similar references
Incremental Structure Learning in Factored MDPs with Continuous States and Actions
Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...
Prioritized Goal Decomposition of Markov Decision Processes: Toward a Synthesis of Classical and Decision Theoretic Planning
We describe an approach to goal decomposition for a certain class of Markov decision processes (MDPs). An abstraction mechanism is used to generate abstract MDPs associated with different objectives, and several methods for merging the policies for these different objectives are considered. In one technique, causal (least-commitment) structures are generated for abstract policies and plan mergi...
Hierarchical Control and Learning for Markov Decision Processes
This dissertation investigates the use of hierarchy and problem decomposition as a means of solving large, stochastic, sequential decision problems. These problems are framed as Markov decision problems (MDPs). The new technical content of this dissertation begins with a discussion of the concept of temporal abstraction. Temporal abstraction is shown to be equivalent to the transformation of a ...
Finite-Horizon Markov Decision Processes with State Constraints
Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...
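As a concrete illustration of the finite-horizon objective described above (maximizing expected reward over a fixed number of decision stages), the following is a standard backward-induction sketch. The tabular interface (states, actions, P, R, T) is assumed for the example, and the state-constraint handling studied in the cited work is not reproduced.

```python
def finite_horizon_value_iteration(states, actions, P, R, T):
    """Backward induction for a finite-horizon MDP.
    P[s][a] is a dict {next_state: probability}; R[s][a] is the immediate
    reward; T is the horizon. Returns a time-dependent policy pi[t][s] and
    value function V[t][s]."""
    V = [{s: 0.0 for s in states} for _ in range(T + 1)]
    pi = [{} for _ in range(T)]
    for t in range(T - 1, -1, -1):          # work backward from the horizon
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Expected value of taking a in s at stage t.
                q = R[s][a] + sum(p * V[t + 1][s2] for s2, p in P[s][a].items())
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], pi[t][s] = best_q, best_a
    return pi, V
```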
Extreme State Aggregation beyond MDPs
We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and more generally feature reinforcement learning is concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately...